

Section: New Results

Development of the French Discourse TreeBank (FDTB)

Participants: Laurence Danlos, Margot Colinet, Jacques Steinlin, Pierre Magistry.

FDTB1 is the first step towards the creation of the French Discourse TreeBank (FDTB), which adds a discourse layer on top of the syntactic layer available in the French Treebank (FTB). In this first step, we identified all the words or phrases in the corpus that are used as “discourse connectives”. The methodology was as follows. First, we highlighted all the items in the corpus that are recorded in LexConn [88], a lexicon of French connectives with 350 items. Next, we eliminated some of these items according to the following criteria:

  1. first, we filtered out the LexConn items that are annotated in FTB with parts of speech incompatible with a connective use, e.g. bref annotated as Adj instead of Adv, en fait annotated as Pro V instead of (compound) Adv;

  2. second, since we stipulate, for theoretical and practical reasons, that elementary arguments of connectives must be clauses or VPs, we filtered out items such as LexConn prepositions that introduce NPs;

  3. last, we filtered out LexConn prepositions and adverbials with a non-discursive function.
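
The three criteria above can be read as a filtering pipeline in which the first two steps are automatic and the third is left to an annotator. The following is a minimal sketch under stated assumptions, not the procedure actually used to build FDTB1: the `Candidate` class, the tag set in `CONNECTIVE_POS`, the `arg_type` labels, and the sample data are all hypothetical and chosen only to mirror the examples in the list.

```python
from dataclasses import dataclass

# Hypothetical, simplified tag set for parts of speech compatible with a
# connective use (assumption; the real FTB tag set differs).
CONNECTIVE_POS = {"Adv", "Conj", "Prep"}

# Criterion 2: elementary arguments must be clauses or VPs.
ALLOWED_ARG_TYPES = {"clause", "VP"}


@dataclass
class Candidate:
    form: str      # surface form matched against the connective lexicon
    pos: str       # part of speech taken from the syntactic annotation
    arg_type: str  # category of the phrase the item introduces


def automatic_filter(candidates):
    """Apply criteria 1 and 2; survivors still need the manual check (criterion 3)."""
    kept = []
    for c in candidates:
        if c.pos not in CONNECTIVE_POS:          # criterion 1: incompatible POS
            continue
        if c.arg_type not in ALLOWED_ARG_TYPES:  # criterion 2: argument is not a clause or VP
            continue
        kept.append(c)
    return kept


# Invented sample data mirroring the examples in the text.
candidates = [
    Candidate("bref", "Adj", "clause"),  # dropped by criterion 1
    Candidate("bref", "Adv", "clause"),  # kept
    Candidate("pour", "Prep", "NP"),     # dropped by criterion 2
    Candidate("pour", "Prep", "VP"),     # kept, but ambiguous: needs manual review
]

to_review = automatic_filter(candidates)
print([(c.form, c.pos, c.arg_type) for c in to_review])
```

In this sketch the function returns the items that pass the automatic checks; deciding whether each survivor has a discursive function is deliberately left outside the code, since that step is manual.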

Unlike the first two, the last criterion requires manual work. For example, the preposition pour (to) is ambiguous between a connective use (Fred s'est dépêché pour être à la gare à 17h (Fred hurried to be at the station at 17h)) and a use as a preposition introducing a complement (Fred s'est dépêché pour aller à la gare (Fred hurried to go to the station)). Disambiguating the two uses is subtle and is the topic of a long paper [58], whose results have been used to enhance Lefff [93].

FDTB1 identifies 9,833 discourse connectives (in 18,535 sentences). This resource is freely available and was released in May 2015 [36].

FDTB2 is the next step in the creation of the FDTB. It consists of annotating the arguments of the discourse connectives identified in FDTB1, as well as the senses of these connectives (expressed through a set of discourse relations). This resource is still under development.